People capture photos and videos to relive and share memories of personal significance. Recently, media montages (stories) have become a popular mode of sharing these memories due to their intuitive and powerful storytelling capabilities. However, creating such montages usually involves many manual searches, clicks, and selections that are time-consuming and cumbersome, adversely affecting the user experience. To alleviate this, we propose task-oriented dialogs for montage creation as a novel interactive tool to seamlessly search, compile, and edit montages from a media collection. To the best of our knowledge, our work is the first to leverage multi-turn conversations for such a challenging application, extending the previous literature studying simple media retrieval tasks. We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection. We take a simulate-and-paraphrase approach to collect these dialogs so that collection is both cost- and time-efficient while still drawing from a natural language distribution. Our analysis and benchmarking of state-of-the-art language models showcase the multimodal challenges present in the dataset. Lastly, we present a mobile demo application that shows the feasibility of the proposed work in real-world settings. Our code and data will be made publicly available.
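Learning task-oriented dialog policies via reinforcement learning typically requires a large amount of interaction with users, which in practice makes such methods unusable for real-world applications. To reduce the data requirements, we propose to leverage data from different dialog domains, thereby reducing the amount of data required from each given domain. In particular, we propose to learn domain-agnostic action embeddings, which capture general-purpose structure that informs the system how to act given the current dialog context, and which are then specialized to a specific domain. On a set of simulated domains, we show that this approach learns with significantly less user interaction, reducing the number of dialogs required for learning by 35%, and reaches a higher level of proficiency than training a separate policy for each domain.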
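As task-oriented dialog systems become increasingly popular in our lives, more realistic tasks have been proposed and explored. However, new practical challenges arise. For instance, current dialog systems cannot effectively handle multiple search results when querying a database, because such scenarios are missing from existing public datasets. In this paper, we propose Database Search Result (DSR) Disambiguation, a novel task that focuses on disambiguating database search results, which enhances the user experience by allowing users to choose from multiple options instead of just one. To study this task, we augment popular task-oriented dialog datasets (MultiWOZ and SGD) with turns that resolve ambiguities by (a) synthetically generating turns through a predefined grammar and (b) collecting human paraphrases for a subset. We find that training on our augmented dialog data improves a model's ability to handle ambiguous scenarios without sacrificing performance on unmodified turns. Furthermore, our data helps the model improve its DSR disambiguation performance even in the absence of in-domain data, suggesting that disambiguation can be learned as a universal dialog skill. Our data and code will be made publicly available.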
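To make step (a) concrete, the following is a minimal sketch of how a disambiguation exchange might be generated synthetically from templates when a database query returns several matches. It is not the authors' grammar, which the abstract does not specify: the templates, the function name `make_disambiguation_turns`, the `slot` parameter, and the example hotel records are all hypothetical.

```python
import random

# Hypothetical templates; the actual predefined grammar is not given in the abstract.
SYSTEM_TEMPLATES = [
    "I found {n} options: {options}. Which one would you like?",
    "There are {n} matches: {options}. Do you have a preference?",
]
USER_TEMPLATES = [
    "I'll take {choice}.",
    "{choice} sounds good.",
]


def make_disambiguation_turns(results, slot="name", rng=None):
    """Build a (system, user, chosen) triple from a list of database results.

    The system turn lists every result and the user turn picks one of them.
    """
    if len(results) < 2:
        raise ValueError("disambiguation needs at least two search results")
    rng = rng or random.Random()
    names = [r[slot] for r in results]
    # Join as "A, B and C" (or "A and B" for two results).
    options = ", ".join(names[:-1]) + " and " + names[-1]
    chosen = rng.choice(results)
    system = rng.choice(SYSTEM_TEMPLATES).format(n=len(names), options=options)
    user = rng.choice(USER_TEMPLATES).format(choice=chosen[slot])
    return system, user, chosen


# Illustrative records only; not taken from MultiWOZ or SGD.
hotels = [{"name": "Acorn House"}, {"name": "Bridge Inn"}, {"name": "City Lodge"}]
system, user, chosen = make_disambiguation_turns(hotels, rng=random.Random(0))
print(system)
print(user)
```

In a pipeline like the one described, templated turns of this kind would be the input to step (b), where a subset is rewritten by human paraphrasers so that the data is closer to natural language.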